Search CORE

380 research outputs found

Inference of Markovian Properties of Molecular Sequences from NGS Data and Applications to Comparative Genomics

Author: Cannon Charles H.
Deng Minghua
Reinert Gesine
Ren Jie
Song Kai
Sun Fengzhu
Publication date: 03/04/2015

Next Generation Sequencing (NGS) technologies generate large amounts of short read data for many different organisms. Because NGS reads are generally short, assembling them and reconstructing the original genome sequence is challenging. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential. A plausible model for this underlying distribution of word counts is given by modelling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of an MC and the transition probability matrix of the sequence. As NGS data do not provide a single long sequence, inference methods on Markovian properties developed for single long sequences cannot be used directly for NGS short read data. Here we derive a normal approximation for such word counts. We also show that the traditional chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate them using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that clustering with an MC of the estimated order gives a plausible clustering of the species.
Comment: accepted by RECOMB-SEQ 201
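The word-count reasoning in this abstract can be made concrete with a small sketch. The Python below assumes reads over the alphabet ACGT and a chain order r ≥ 1. It counts words only within reads (no word spans two reads), fits an order-r Markov chain to the pooled reads by maximum likelihood, forms plug-in expected counts for k-words, and sums Pearson chi-square terms. All function names are hypothetical, the plug-in expectation ignores read-boundary corrections, and this is not the normal approximation, the gamma approximation, or the order estimators derived in the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def count_words(reads, k):
    """Count every length-k word inside each read; words never span two reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_transitions(reads, r):
    """Maximum-likelihood order-r transition probabilities from pooled reads:
    P(b | ctx) = N(ctx + b) / sum_c N(ctx + c)."""
    assert r >= 1, "this sketch assumes order r >= 1"
    pair_counts = count_words(reads, r + 1)
    probs = {}
    for ctx in map("".join, product(ALPHABET, repeat=r)):
        total = sum(pair_counts[ctx + b] for b in ALPHABET)
        if total:  # contexts never observed get no row
            probs[ctx] = {b: pair_counts[ctx + b] / total for b in ALPHABET}
    return probs


def expected_count(word, probs, start_counts, r):
    """Plug-in expected count of `word`: N(first r letters) times the product
    of fitted transition probabilities along the word (no boundary correction)."""
    e = float(start_counts[word[:r]])
    for j in range(len(word) - r):
        e *= probs.get(word[j:j + r], {}).get(word[j + r], 0.0)
    return e


def chi_square(reads, k, r):
    """Pearson chi-square over all k-words with positive expected count."""
    observed = count_words(reads, k)
    probs = estimate_transitions(reads, r)
    start_counts = count_words(reads, r)
    stat = 0.0
    for word in map("".join, product(ALPHABET, repeat=k)):
        e = expected_count(word, probs, start_counts, r)
        if e > 0:
            stat += (observed[word] - e) ** 2 / e
    return stat


if __name__ == "__main__":
    reads = ["ACACACAC"]
    print(estimate_transitions(reads, 1)["A"])  # {'A': 0.0, 'C': 1.0, 'G': 0.0, 'T': 0.0}
    print(chi_square(reads, k=3, r=1))  # 0.5
```

With the single read ACACACAC, the words ACA and CAC each have a plug-in expectation of 4 but are observed only 3 times, because the end of the read cuts off the last occurrence. This small boundary effect hints at why statistics designed for one long sequence need adjustment before they are applied to many short reads.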

arXiv.org e-Print Archive

CiteSeerX

Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing

Author: Cannon Charles H
Wickneswari Ratnam
Wong Melissa ML
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract
Background: Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well characterized, but the genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and discovering informative sequence variants.
Results: We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues, using paired-end libraries and a single lane of an Illumina GAII machine. De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium, respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx, and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against a public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168) and one legume-specific family (miR2086). Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. Using the assemblies as a reference, we discovered 16,648 and 9,335 high-quality putative Single Nucleotide Polymorphisms (SNPs) in the transcriptomes of A. auriculiformis and A. mangium, respectively, yielding useful markers for population genetics studies and marker-assisted selection.
Conclusion: We have produced the first comprehensive transcriptome-wide analysis in A. auriculiformis and A. mangium using de novo assembly techniques. Our high-quality and comprehensive assemblies allowed the identification of many genes involved in lignin biosynthesis and secondary cell wall formation in Acacia hybrids. Our results demonstrate that Next Generation Sequencing is a cost-effective method for gene discovery, identification of regulatory sequences, and development of informative markers in a non-model plant.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central


Evidence for a Trade-Off Strategy in Stone Oak (Lithocarpus) Seeds between Physical and Chemical Defense Highlights Fiber as an Important Antifeedant

Author: Cannon Charles H.
Chen Xi
Conklin-Brittain Nancy Lou
Publication venue: Public Library of Science (PLoS)
Publication date: 20/02/2014

Trees in the beech or oak family (Fagaceae) have a mutualistic relationship with scatter-hoarding rodents. Rodents obtain nutrients and energy by consuming seeds, while providing seed dispersal for the tree by allowing some cached seeds to germinate. Seed predation and caching behavior of rodents are primarily affected by seed size, mechanical protection, macronutrient content, and chemical antifeedants. To enhance seed dispersal, trees must optimize trade-offs in investment between macronutrients and antifeedants. Here, we examine this important chemical balance in the seeds of tropical stone oak species with two substantially different fruit morphologies. These two distinct fruit morphologies in Lithocarpus differ in the degree of mechanical protection of the seed. For ‘acorn’ fruit, a thin exocarp forms a shell around the seed, while for ‘enclosed receptacle’ (ER) fruit, the seed is embedded in a woody receptacle. We compared the concentrations of numerous macronutrients and antifeedants in seeds from several Lithocarpus species, focusing on two pairs of sympatric species with different fruit morphologies. We found that macronutrients, particularly total non-structural carbohydrate, were more concentrated in seeds of ER fruits, while antifeedants, primarily fibers, were more concentrated in seeds of acorn fruits. The trade-off between these two major chemical components was more evident between the two sympatric lowland species than between the two highland species. Surprisingly, no significant difference in overall tannin concentrations in the seeds was observed between the two fruit morphologies. Instead, the major trade-off between macronutrients and antifeedants involved indigestible fibers. Future studies of this complex mutualism should carefully consider the role of indigestible fibers in the foraging behavior of scatter-hoarding rodents.
Human Evolutionary Biolog

Harvard University - DASH

Exploiting sparseness in de novo genome assembly

Author: Cannon Charles H
Ma Zhanshan Sam
Pop Mihai
Ye Chengxi
Yu Douglas W
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. Methods: In this paper, we demonstrate that constructing a sparse assembly graph, which stores only a small fraction of the observed k-mers as nodes together with the links between these nodes, allows the de novo assembly of even moderately sized genomes (~500 M) on a typical laptop computer. Results: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.
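The sparse-graph idea can be illustrated with a toy sketch. The Python below is a hypothetical simplification, not SparseAssembler's actual data structure: it keeps only every g-th k-mer of each read as a node and stores the bases between consecutive kept k-mers as the edge label, so a path through the sparse graph still spells the underlying sequence. Sampling independently per read, as done here, does not guarantee that overlapping reads select the same k-mers; a real sparse assembler has to coordinate which k-mers it keeps across reads.

```python
def sparse_kmer_graph(reads, k, g):
    """Toy sparse k-mer graph (hypothetical helper).

    Keeps only every g-th k-mer of each read (offsets 0, g, 2g, ...) as a node
    and links consecutive kept k-mers; the edge label holds the g bases that
    extend the sequence from the end of the first k-mer to the end of the second.
    """
    nodes = set()
    edges = {}  # (u, v) -> bases appended when walking from u to v
    for read in reads:
        kept = list(range(0, len(read) - k + 1, g))
        nodes.update(read[p:p + k] for p in kept)
        for a, b in zip(kept, kept[1:]):
            edges[(read[a:a + k], read[b:b + k])] = read[a + k:b + k]
    return nodes, edges


def spell(path, edges):
    """Reconstruct the sequence along a path of sparse nodes."""
    seq = path[0]
    for u, v in zip(path, path[1:]):
        seq += edges[(u, v)]
    return seq


if __name__ == "__main__":
    read = "ACGTTGCAACGGT"
    nodes, edges = sparse_kmer_graph([read], k=4, g=3)
    dense = {read[i:i + 4] for i in range(len(read) - 3)}
    print(len(dense), len(nodes))  # 10 4
    print(spell(["ACGT", "TTGC", "CAAC", "CGGT"], edges) == read)  # True
```

In this example only 4 of the 10 distinct 4-mers are stored as nodes, a tiny-scale analogue of the memory reduction described in the abstract; with g = 1 the sketch degenerates to an ordinary path through every k-mer of the read.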

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

University of East Anglia digital repository

Where Have the Beans Been? Student-Driven Laboratory Learning Activities with Legumes for Conceptual Change

Author: Cannon Charles
Cherif Abour H.
Gialamas Stefanos
Kassem Sana
Movahedzadeh Farahnaz
Siuda JoElla E.
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 02/05/2018

Accessible, familiar, relevant, effective and expansive teaching and learning resources are the dream of every teacher and educator in all types of educational systems. Furthermore, engaging students in meaningful scientific investigations using familiar objects inspires them to make the needed connection with the science concept being introduced. Actively solving problems and arriving at empirically based conclusions has a lasting effect on students’ learning; moreover, it fosters a deep appreciation of science and a real understanding of the scientific process. In this paper, we provide a set of laboratory-based activities using a variety of edible legumes (beans, peas, lentils, etc.) to introduce students to various STEM concepts in integrated, empirical investigations. Legumes are grown throughout the world and have been cultivated for more than 11,000 years. Legume seeds come in a wide variety of shapes, sizes, and colors, and are known for their differing nutritional values based on their content. But most of all, they are accessible, familiar, real and relevant, and are found in a virtually limitless range of locales. It is precisely these qualities that make them an effective teaching and learning resource in laboratory classroom settings. Throughout these laboratory learning activities, students conduct hands-on experiments and research, engage in productive discussion, write scientific papers, and present their findings within a scientific framework. After this set of inquiry activities, teachers and students will never look at beans in the same way again; teachers may even come to consider them one of their best teaching and learning resources. Finally, the appendix offers more ideas to support teachers who are introducing these scientific concepts with legumes.
We include additional ideas, information, activities, and questions (complete with answers) that we feel students may ask during the learning process. In doing so, we aim to save time and energy for those teachers who wish to use and/or adapt the suggested laboratory learning activities as a means of introducing conceptual changes. Keywords: Legumes, Science Inquiry, Laboratory experiments, Learning science, Effective learning resources.

International Institute for Science, Technology and Education (IISTE): E-Journals

Warm Dust and Spatially Variable PAH Emission in the Dwarf Starburst Galaxy NGC 1705

Author: Aigen Li
Brent A. Buckalew
Bruce T. Draine
Charles W. Engelbracht
Claus Leitherer
Daniel A. Dale
Daniela Calzetti
David J. Hollenbach
de Jong T.
Eric J. Murphy
Fabian Walter
George H. Rieke
George Helou
George J. Bendo
Haas M.
Helene Roussel
John M. Cannon
John‐David T. Smith
Karl D. Gordon
Kartik Sheth
Lee Armus
Lemke D.
Marcia J. Rieke
Martin J. Meyer
Michael W. Regan
Michele D. Thornley
Robert C. Kennicutt Jr.
Stanimirovic S.
Thomas H. Jarrett
Publication venue: University of Chicago Press
Publication date: 01/01/2006

We present Spitzer observations of the dwarf starburst galaxy NGC 1705 obtained as part of SINGS. The galaxy morphology is very different shortward and longward of ~5 microns: short-wavelength imaging shows an underlying red stellar population, with the central super star cluster (SSC) dominating the luminosity; longer-wavelength data reveal warm dust emission arising from two off-nuclear regions offset by ~250 pc from the SSC. These regions show little extinction at optical wavelengths. The galaxy has a relatively low global dust mass (~2 × 10^5 solar masses, implying a global dust-to-gas mass ratio ~2–4 times lower than the Milky Way average). The off-nuclear dust emission appears to be powered by photons from the same stellar population responsible for the excitation of the observed Hα emission; these photons are unassociated with the SSC (though a contribution from embedded sources to the IR luminosity of the off-nuclear regions cannot be ruled out). Low-resolution IRS spectroscopy shows moderate-strength PAH emission in the 11.3 micron band in the eastern peak; no PAH emission is detected in the SSC or the western dust emission complex. There is significant diffuse 8 micron emission after scaling and subtracting shorter-wavelength data; the spatially variable PAH emission strengths revealed by the IRS data suggest caution in interpreting diffuse 8 micron emission as arising from PAH carriers alone. The metallicity of NGC 1705 falls at the transition level of 35% solar found by Engelbracht and collaborators; the fact that a system at this metallicity shows spatially variable PAH emission demonstrates the complexity of interpreting diffuse 8 micron emission. Undetected in radio continuum, NGC 1705 deviates significantly from the canonical far-IR vs. radio correlation. (Abridged)
Comment: ApJ, in press; please retrieve full-resolution version from http://www.astro.wesleyan.edu/~cannon/pubs.htm

arXiv.org e-Print Archive

Crossref

The Nature of Infrared Emission in the Local Group Dwarf Galaxy NGC 6822 As Revealed by Spitzer

Author: Bianchi S.
Brent A. Buckalew
Bruce T. Draine
Caplan J.
Caroline Bot
Charles W. Engelbracht
Claus Leitherer
Daniel A. Dale
Daniela Calzetti
David J. Hollenbach
de Jong T.
Eric J. Murphy
Fabian Walter
George Helou
George J. Bendo
Helene Roussel
Hildebrand R. H.
Israel F. P.
John M. Cannon
Karl D. Gordon
Kartik Sheth
Klein U.
Lee Armus
Martin J. Meyer
Michele D. Thornley
Robert C. Kennicutt Jr.
Skillman E. D.
Thomas H. Jarrett
W. J. G. de Blok
Publication venue: University of Chicago Press
Publication date: 01/01/2006

We present Spitzer imaging of the metal-deficient (Z ~ 30% Z_sun) Local Group dwarf galaxy NGC 6822. On spatial scales of ~130 pc, we study the nature of IR, Hα, HI, and radio continuum emission. Nebular emission strength correlates with IR surface brightness; however, roughly half of the IR emission is associated with diffuse regions that are not luminous in Hα (as found in previous studies). The global ratio of dust to HI gas in the ISM, while uncertain at the factor of ~2 level, is ~25 times lower than the global values derived for spiral galaxies using similar modeling techniques; localized ratios of dust to HI gas are about a factor of five higher than the global value in NGC 6822. There are strong variations (factors of ~10) in the relative ratios of Hα and IR flux throughout the central disk; the low dust content of NGC 6822 is likely responsible for the different Hα/IR ratios compared to those found in more metal-rich environments. The Hα and IR emission are associated with high-column-density (≳ 10^21 cm^-2) neutral gas. Increases in IR surface brightness appear to be affected by both increased radiation field strength and increased local gas density. Individual regions and the galaxy as a whole fall within the observed scatter of recent high-resolution studies of the radio-far-IR correlation in nearby spiral galaxies; this is likely the result of depleted radio and far-IR emission strengths in the ISM of this dwarf galaxy.
Comment: ApJ, in press; please retrieve full-resolution version from http://www.astro.wesleyan.edu/~cannon/pubs.htm

arXiv.org e-Print Archive

Crossref

The Australian National University

CERN Document Server

VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

Author: Adkins Joshua N
Ansong Charles
Cannon William R
Jensen Jeffrey L
Kobold Markus A
McCue Lee Ann
Payne Samuel H
Peterson Elena S
Schrimpe-Rutledge Alexandra C
Walker Hyunjoo
Webb Samantha R
Webb-Robertson Bobbie-Jo M
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract
Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have each demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools either focus on a single data type or are existing software tools retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates.
Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics data (probe or RNA-Seq) into a genomic context, all of which can be visualized at three levels of genomic resolution. Data are interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently within the software through interaction with BLAST. We apply VESPA to two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate how rapidly mis-annotations can be found and explored using either proteomics data alone or in combination with transcriptomic data.
Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data are evaluated via visual analysis across multiple levels of genomic resolution, linked searches, and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and the core programming requirements for visualizing these large heterogeneous datasets in a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central